356 research outputs found
Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It
We empirically show that Bayesian inference can be inconsistent under
misspecification in simple linear regression problems, both in a model
averaging/selection and in a Bayesian ridge regression setting. We use the
standard linear model, which assumes homoskedasticity, whereas the data are
heteroskedastic, and observe that the posterior puts its mass on ever more
high-dimensional models as the sample size increases. To remedy the problem, we
equip the likelihood in Bayes' theorem with an exponent called the learning
rate, and we propose the Safe Bayesian method to learn the learning rate from
the data. SafeBayes tends to select small learning rates as soon as the standard
posterior is not `cumulatively concentrated', and its results on our data are
quite encouraging. Comment: 70 pages, 20 figures
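As a rough illustration of what "equipping the likelihood with an exponent" means, the sketch below computes a generalized posterior with learning rate `eta` for a much simpler model than the paper's regression setting: a Gaussian mean with known noise variance and a conjugate Gaussian prior. The function name `eta_posterior` and all parameter names are hypothetical, and the sketch does not implement SafeBayes itself (which learns `eta` from the data); it only shows how a smaller `eta` down-weights the data relative to the prior.

```python
def eta_posterior(xs, eta, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Mean and variance of the eta-generalized posterior for a Gaussian mean.

    The posterior is proportional to prior * likelihood**eta. Assumptions
    (not from the paper): prior N(prior_mean, prior_var), data N(theta, noise_var).
    """
    n = len(xs)
    # Raising a Gaussian likelihood to the power eta scales its precision by eta.
    precision = 1.0 / prior_var + eta * n / noise_var
    mean = (prior_mean / prior_var + eta * sum(xs) / noise_var) / precision
    return mean, 1.0 / precision


data = [1.2, 0.8, 1.9, 1.1, 0.5]  # made-up observations
for eta in (1.0, 0.5, 0.1):
    mean, var = eta_posterior(data, eta)
    print(f"eta={eta}: posterior mean={mean:.3f}, variance={var:.3f}")
```

With `eta = 1` this is the standard Bayesian posterior; with `eta = 0` the data are ignored and the prior is returned unchanged.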
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
We present a novel notion of complexity that interpolates between and
generalizes some classic existing complexity notions in learning theory: for
estimators like empirical risk minimization (ERM) with arbitrary bounded
losses, it is upper bounded in terms of data-independent Rademacher complexity;
for generalized Bayesian estimators, it is upper bounded by the data-dependent
information complexity (also known as stochastic or PAC-Bayesian,
$\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity). For
(penalized) ERM, the new complexity reduces to (generalized) normalized maximum
likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence
regret. Our first main result bounds excess risk in terms of the new
complexity. Our second main result links the new complexity via Rademacher
complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper,
Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with $L_\infty$.
Together, these results recover optimal bounds for VC- and large (polynomial
entropy) classes, replacing localized Rademacher complexity by a simpler
analysis which almost completely separates the two aspects that determine the
achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 pages
Almost the Best of Three Worlds: Risk, Consistency and Optional Stopping for the Switch Criterion in Nested Model Selection
We study the switch distribution, introduced by Van Erven et al. (2012),
applied to model selection and subsequent estimation. While switching was known
to be strongly consistent, here we show that it achieves minimax optimal
parametric risk rates up to a $\log\log n$ factor when comparing two nested
exponential families, partially confirming a conjecture by Lauritzen (2012) and
Cavanaugh (2012) that switching behaves asymptotically like the Hannan-Quinn
criterion. Moreover, like Bayes factor model selection but unlike standard
significance testing, when one of the models represents a simple hypothesis,
the switch criterion defines a robust null hypothesis test, meaning that its
Type-I error probability can be bounded irrespective of the stopping rule.
Hence, switching is consistent, insensitive to optional stopping and almost
minimax risk optimal, showing that, Yang's (2005) impossibility result
notwithstanding, it is possible to `almost' combine the strengths of AIC and
Bayes factor model selection. Comment: To appear in Statistica Sinica
Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations
It is often claimed that Bayesian methods, in particular Bayes factor methods
for hypothesis testing, can deal with optional stopping. We first give an
overview, using elementary probability theory, of three different mathematical
meanings that various authors give to this claim: (1) stopping rule
independence, (2) posterior calibration and (3) (semi-) frequentist robustness
to optional stopping. We then prove theorems to the effect that these claims do
indeed hold in a general measure-theoretic setting. For claims of type (2) and
(3), such results are new. By allowing for non-integrable measures based on
improper priors, we obtain particularly strong results for the practically
important case of models with nuisance parameters satisfying a group invariance
(such as location or scale). We also discuss the practical relevance of
(1)--(3), and conclude that whether Bayes factor methods actually perform well
under optional stopping crucially depends on details of models, priors and the
goal of the analysis. Comment: 29 pages
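A small simulation can make meaning (3), frequentist robustness to optional stopping, concrete in the simplest case. The sketch below assumes a simple null N(0, 1) and a point alternative N(mu1, 1), so the Bayes factor is just a running likelihood ratio; the function names, `mu1`, `max_n` and the threshold rule "stop as soon as the Bayes factor reaches 1/alpha" are illustrative choices, not taken from the paper. Under these assumptions the running likelihood ratio is a nonnegative martingale under the null, so by Ville's inequality the probability of ever crossing 1/alpha is at most alpha, even with this aggressive stopping rule.

```python
import math
import random


def rejects_under_optional_stopping(rng, alpha=0.05, mu1=0.5, max_n=200):
    """Simulate one run under H0: X ~ N(0, 1); return True if H0 is rejected."""
    log_bf = 0.0
    threshold = math.log(1.0 / alpha)
    for _ in range(max_n):
        x = rng.gauss(0.0, 1.0)
        # Log likelihood ratio of N(mu1, 1) against N(0, 1) for one observation.
        log_bf += mu1 * x - 0.5 * mu1 ** 2
        if log_bf >= threshold:  # stop and reject as soon as the evidence looks strong
            return True
    return False


def false_rejection_rate(runs=2000, seed=0, alpha=0.05):
    """Fraction of simulated null runs in which H0 is falsely rejected."""
    rng = random.Random(seed)
    hits = sum(rejects_under_optional_stopping(rng, alpha) for _ in range(runs))
    return hits / runs


print(false_rejection_rate())
```

The printed fraction is a Monte Carlo estimate, so it fluctuates with the seed, but it should stay near or below `alpha`. The same stopping rule applied to a fixed-sample p-value would not come with such a guarantee.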
Beyond Neyman-Pearson
A standard practice in statistical hypothesis testing is to mention the
p-value alongside the accept/reject decision. We show the advantages of
mentioning an e-value instead. With p-values, we cannot use an extreme
observation (e.g. $p \ll \alpha$) for getting better frequentist decisions.
With e-values it is straightforward, since they provide Type-I risk control in
a generalized Neyman-Pearson setting with the decision task (a general loss
function) determined post-hoc, after observation of the data -- thereby
providing a handle on `roving $\alpha$'s'. When Type-II risks are taken into
consideration, the only admissible decision rules in the post-hoc setting turn
out to be e-value-based. Similarly, if the loss incurred when specifying a
faulty confidence interval is not fixed in advance, standard confidence
intervals and distributions may fail whereas e-confidence sets and e-posteriors
still provide valid risk guarantees. Comment: Second, thoroughly revised version. Part of the material in the first
version has moved to another paper, "The E-Posterior", to appear in Phil.
Trans. Royal Soc. of London Series A
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
We consider online learning algorithms that guarantee worst-case regret rates
in adversarial environments (so they can be deployed safely and will perform
robustly), yet adapt optimally to favorable stochastic environments (so they
will perform well in a variety of settings of practical importance). We
quantify the friendliness of stochastic environments by means of the well-known
Bernstein (a.k.a. generalized Tsybakov margin) condition. For two recent
algorithms (Squint for the Hedge setting and MetaGrad for online convex
optimization) we show that the particular form of their data-dependent
individual-sequence regret guarantees implies that they adapt automatically to
the Bernstein parameters of the stochastic environment. We prove that these
algorithms attain fast rates in their respective settings both in expectation
and with high probability.
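For readers unfamiliar with the Hedge setting, the sketch below shows the basic exponential-weights (Hedge) algorithm with a fixed learning rate. This is the classical baseline, not Squint or MetaGrad, whose adaptive tuning is the subject of the paper; the function name `hedge` and the random loss sequence are illustrative assumptions. For losses in [0, 1] and `eta = sqrt(8 ln K / T)`, the classical worst-case guarantee is regret at most `sqrt(T ln K / 2)` on any loss sequence.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Run basic Hedge; return (learner's cumulative expected loss, regret).

    loss_rounds is a list of per-round loss vectors with entries in [0, 1].
    """
    k = len(loss_rounds[0])
    cumulative = [0.0] * k  # cumulative loss of each expert
    learner_loss = 0.0
    for losses in loss_rounds:
        # Exponential weights; subtracting the minimum keeps exp() well scaled.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        learner_loss += sum(w / total * l for w, l in zip(weights, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, learner_loss - min(cumulative)


rng = random.Random(0)
T, K = 500, 5
rounds = [[rng.random() for _ in range(K)] for _ in range(T)]
eta = math.sqrt(8.0 * math.log(K) / T)
loss, regret = hedge(rounds, eta)
print(f"regret={regret:.3f}, worst-case bound={math.sqrt(T * math.log(K) / 2.0):.3f}")
```

On a friendly stochastic sequence like this one the realized regret is typically well below the worst-case bound, which is the gap that adaptive methods aim to exploit.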
Contextuality of misspecification and data-dependent losses
Analysis and Stochastics
The minimum description length principle
The PDF file in the repository consists only of the preface, foreword and chapter 1; I am not allowed by the publisher to put the remainder of this book on the web.
If you are a member of the CWI evaluation committee and you read this: you are of course entitled to access the full book. If you would like to see it, please contact CWI (or, even easier, contact me directly), and we will be happy to give you a copy of the book for free.